Day 24 - Data Processing Modules - Pandas - Series Basics

2019 Ironman Contest (鐵人賽), Day 24

wayneli

2018-11-07 23:18:15

A Series holds one-dimensional data and is made up of an Index and a Value.

All code below assumes the Pandas module has already been imported:

import pandas as pd
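To make the Index/Value split concrete, here is a minimal sketch that builds a Series and reads both parts back through its index and values attributes. The labels 'a', 'b', 'c' and the numbers are made up for illustration and do not come from the original examples.

```python
import pandas as pd

# Illustrative data and labels (assumed, not from the article)
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# The two components of a Series
index_labels = list(s.index)   # the Index part
values = s.values.tolist()     # the Value part

print(index_labels)
print(values)
```

The index carries the labels and the values carry the data; every lookup described later goes through one or the other.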

Creating a Series

Value

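The data (value) of a Series can come from an array-like object (list, range), a dict, or a single value. Let's look at the Series each of these produces in turn.

Array (list, range)

The default index is a sequence of integers starting at 0.

# list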
data = [1, 2, 3, 4, 5, 6]
s = pd.Series(data)
print(s)

# Output:
0    1
1    2
2    3
3    4
4    5
5    6
dtype: int64

#range
data = range(1,7)
s = pd.Series(data)
print(s)

# Output:
0    1
1    2
2    3
3    4
4    5
5    6
dtype: int64
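The same pattern should also apply to a NumPy array, which is another common array-like source. This is a minimal sketch, assuming NumPy is installed; the float values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative NumPy array (assumed data, not from the article)
arr = np.array([1.5, 2.5, 3.5])
s_np = pd.Series(arr)

# The index is still the default 0-based integers;
# the dtype follows the array's dtype
print(s_np)
```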

Dict

The default index is the dict's keys.


# dict
data = {'one':1, 'two':2, 'three':3, 'four':4}
s = pd.Series(data)
print(s)

# Output:
one      1
two      2
three    3
four     4
dtype: int64
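When a dict is combined with an explicit index, pandas picks values by key rather than by position, and keys missing from the dict become NaN. The sketch below illustrates this; the key 'five' is deliberately absent from the dict, and the names used here are for illustration only.

```python
import pandas as pd

data = {'one': 1, 'two': 2, 'three': 3}

# Select and reorder by key; 'five' is not in the dict
s_sel = pd.Series(data, index=['two', 'one', 'five'])

# The missing key yields NaN, which makes the dtype float
print(s_sel)
print(s_sel.isna().tolist())
```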

Single value

A single value produces a Series with one element.

data = '字串'
s = pd.Series(data)
print(s)

# Output:
0    字串
dtype: object

To build a Series that repeats the value, specify an index; the value is repeated once for every index label.

data = '字串'
s = pd.Series(data, index = range(0,5))
print(s)

# Output:
0    字串
1    字串
2    字串
3    字串
4    字串
dtype: object

Index

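We have seen that a Series can be built from different data types. Besides the default integer index and the dict-key index, Pandas also lets you supply a custom index, as shown below.

Here the custom index starts at 11. Note that the number of index labels must match the number of data items.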

#range
data = range(1,7)
s = pd.Series(data, index = range(11,17))
print(s)

# Output:
11    1
12    2
13    3
14    4
15    5
16    6
dtype: int64
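To see what the length requirement means in practice, the sketch below deliberately passes three values with only two index labels and catches the resulting error. The variable name mismatch_error is introduced here for illustration.

```python
import pandas as pd

# Three values but only two index labels: the lengths do not match
try:
    pd.Series([1, 2, 3], index=[11, 12])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

# pandas rejects the mismatch with a ValueError
print(type(mismatch_error).__name__)
```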

Build a date index from a date range:

#range
data = range(1,7)
dateIndex = pd.date_range('20181101', periods=6)
s = pd.Series(data, index = dateIndex)
print(s)

# Output:
2018-11-01    1
2018-11-02    2
2018-11-03    3
2018-11-04    4
2018-11-05    5
2018-11-06    6
Freq: D, dtype: int64

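You can also set the interval with the freq parameter, using the frequency codes listed in the official documentation. The default is D (calendar day); B stands for business day frequency and keeps only working days, as in this example: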

#range
data = range(1,7)
dateIndex = pd.date_range('20181101', periods=6, freq='B')
s = pd.Series(data, index = dateIndex)
print(s)

# Output:
2018-11-01    1
2018-11-02    2
2018-11-05    3
2018-11-06    4
2018-11-07    5
2018-11-08    6
Freq: B, dtype: int64
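One way to check that a business-day range really skips the weekend is to look at the day-of-week numbers of the generated dates. This is a minimal sketch using the same start date as the example above; the variable names are introduced here for illustration.

```python
import pandas as pd

business_days = pd.date_range('20181101', periods=6, freq='B')

# dayofweek uses Monday=0 ... Sunday=6, so weekends are 5 and 6
weekdays = business_days.dayofweek.tolist()

print(business_days.freqstr)
print(weekdays)
```

Every number in the list is below 5, which confirms that Saturday 2018-11-03 and Sunday 2018-11-04 were skipped.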

Accessing a Series

Basic access looks like list access: put an index value inside [] to get the corresponding value, as shown here:

data = range(1,7)
s = pd.Series(data)
print(s[3])

# Output: 4
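One caveat worth illustrating: with an integer index, [] looks up the label, not the position, so it only resembles list indexing while the index is the default 0-based one. The sketch below uses a custom index starting at 11 to show the difference; the variable names are made up for illustration.

```python
import pandas as pd

s_custom = pd.Series(range(1, 7), index=range(11, 17))

# Label 13 exists, so this returns the value stored under it
value_at_label_13 = s_custom[13]

# Label 3 does not exist, even though position 3 does
try:
    s_custom[3]
    label_3_found = True
except KeyError:
    label_3_found = False

print(value_at_label_13)
print(label_3_found)
```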

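Series also offers more advanced access methods, which are the ones used most often in practice:

loc: look up by index label

* In Python, a colon inside square brackets means "from ... to ..."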

data = range(1,7)
s = pd.Series(data)
print(s.loc[2:4])

# Output:
2    3
3    4
4    5
dtype: int64

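The code above takes the data from index 2 through index 4, which is known as slicing. Leaving the left side of the colon empty means starting from the first element, and leaving the right side empty means going to the last element, as shown here: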

data = range(1,7)
s = pd.Series(data)
print(s.loc[2:])

# Output:
2    3
3    4
4    5
5    6
dtype: int64

print(s.loc[:4])

# Output:
0    1
1    2
2    3
3    4
4    5
dtype: int64
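Because loc works on labels, the same slicing syntax should also work with non-numeric labels. Below is a minimal sketch with string labels, assuming the labels are unique; the variable names are introduced here for illustration.

```python
import pandas as pd

s_labels = pd.Series({'one': 1, 'two': 2, 'three': 3, 'four': 4})

# Slice by label: both the start and the end label are included
sliced = s_labels.loc['two':'four']

print(sliced)
```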

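iloc: look up by integer position

Whether a Series uses a custom index or the default one, every element still has an implicit integer position, so iloc can always fetch a value by position, as shown here: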

data = {'one':1, 'two':2, 'three':3, 'four':4}
s = pd.Series(data)
print(s.loc['one'])

# Output:
1

print(s.iloc[0])

# Output:
1

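Slicing with iloc uses the same syntax as loc. The one difference is that iloc excludes the end position, as this example shows:

data = range(1,7)
s = pd.Series(data)
# loc includes index 5
print(s.loc[2:5])

# Output: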
2    3
3    4
4    5
5    6
dtype: int64

# iloc excludes index 5
print(s.iloc[2:5])

# Output:
2    3
3    4
4    5
dtype: int64
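With the default index, labels and positions happen to coincide, which can hide the difference between loc and iloc. The sketch below uses a custom index starting at 11 so the two are clearly distinct; the variable names are made up for illustration.

```python
import pandas as pd

s_custom = pd.Series(range(1, 7), index=range(11, 17))

# loc slices by label and includes the end label 14
by_label = s_custom.loc[12:14]

# iloc slices by position and excludes the end position 3
by_position = s_custom.iloc[1:3]

print(by_label)
print(by_position)
```

Here loc returns three rows (labels 12 to 14) while iloc returns two (positions 1 and 2, which carry labels 12 and 13).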

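That covers the basics of working with a Series.

If you spot any mistakes in this article, please leave a comment so the misunderstanding can be corrected. Thanks!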